Skip to content

feat(rag): Ask this book — conversational streaming chat + gpt-4.1-mini#385

Merged
mrviduus merged 1 commit into
mainfrom
rag-chat-upgrade
Jun 19, 2026
Merged

feat(rag): Ask this book — conversational streaming chat + gpt-4.1-mini#385
mrviduus merged 1 commit into
mainfrom
rag-chat-upgrade

Conversation

@mrviduus

Copy link
Copy Markdown
Owner

Turns the single-turn FAQ-style Ask into a real chat — grounding + citations + spoiler gate unchanged.

  • Model: rag.askgpt-4.1-mini (dedicated openai-rag provider, mirrors openai-explain).
  • Companion prompt: warm + conversational, handles greetings (no more cold 'the excerpts do not contain hi'); grounding is STRICT and overrides everything — every book-fact (incl. themes/symbolism/interpretation) must cite [n]; greeting/meta the only no-cite case.
  • Multi-turn memory: history (client-held, server-clamped last 6 turns / 4000 chars) → real LlmMessage[]; retrieval per latest question.
  • SSE streaming (content-negotiated, copies Explain): delta* → terminal done {citations,lastReadOrd,insufficient}; empty chunks → friendly delta + done, no model call (spoiler/hallucination short-circuit on both paths). JSON fallback unchanged. Both catalog + user-book.
  • Web: token-by-token render + caret, suggested starters on empty thread, history sent, abort-safe. Mobile keeps JSON.

architect → backend + web → adversarial QA (SHIP) — closed a grounding loophole (interpretation/themes weren't bound) + added empty-chunk-no-LLM-call tests. 878 unit + 564 web green; solution + web build clean.

⚠️ The companion prompt loosened the cold-refusal rule — re-run the paid grounding/citation eval on /ai-quality post-deploy (only runtime confirmation of the tightened prompt).

🤖 Generated with Claude Code

Turns the single-turn FAQ into a real chat (grounding + citations + spoiler gate
unchanged).
- Model: rag.ask -> gpt-4.1-mini (dedicated openai-rag provider, mirrors
  openai-explain; translate/podcast stay nano).
- Companion system prompt: warm, conversational, handles greetings; grounding is
  STRICT and overrides everything (covers themes/symbolism/interpretation, not
  just plot) -> every book-fact cites [n]; greeting/meta the only no-cite case.
- Multi-turn memory: AskRequest.History (client-held, server-clamped last 6
  turns / 4000 chars each); real LlmMessage[] (system -> excerpts -> history ->
  question); retrieval still per latest question. Eval call passes [].
- SSE streaming (content-negotiated, copies Explain): delta* then terminal done
  {citations,lastReadOrd,insufficient}; empty chunks -> friendly delta + done,
  NO model call (spoiler/hallucination short-circuit, both paths). JSON fallback
  unchanged. Both catalog + user-book ask. MaxOutputTokens 320->400.
- Web: streaming token render + blinking caret, suggested starters on empty
  thread, last-6 history sent, abort on new-question/unmount. Mobile keeps JSON.

architect -> backend+web -> adversarial QA (SHIP; closed grounding loophole +
added empty-chunk-no-LLM-call tests). 878 unit + 564 web green.
NOTE: re-run the paid grounding/citation eval on /ai-quality post-deploy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mrviduus mrviduus merged commit c80bbd0 into main Jun 19, 2026
5 checks passed
@mrviduus mrviduus deleted the rag-chat-upgrade branch June 19, 2026 06:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant